Support Of Full Resolution Graphics, Menus, And Subtitles In Frame Compatible 3D Delivery

ABSTRACT

Full resolution graphic overlays (e.g., graphics, menus, arrows, buttons, captions, banners, picture in picture information) and subtitles in frame compatible 3D delivery for a scalable system are described.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Patent Provisional Application No. 61/223,027, filed on 4 Jul. 2009, and U.S. Patent Provisional Application No. 61/237,150, filed 26 Aug. 2009, both hereby incorporated by reference in their entireties.

TECHNOLOGY

The present disclosure relates to scalable 3D video applications. More particularly, it relates to a method for embedding subtitles and/or graphic overlays in a scalable 3D video application.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a diagram of a multi-layered 3D coding system.

FIG. 2 shows a side by side packing of a video image.

FIG. 3 shows an over-under packing of a video image.

FIG. 4 shows a conventional way of embedding subtitles in a side by side packed video image.

FIG. 5 shows a conventional way of embedding subtitles in an over-under packed video image.

FIG. 6 shows a conventional way of embedding both subtitles and graphic overlays in a side by side packed video image.

FIG. 7 shows a conventional way of embedding both subtitles and graphic overlays in an over-under packed video image.

FIG. 8 shows a base layer image of a side by side frame packing arrangement.

FIG. 9 shows the enhancement layer image of a side by side frame packing arrangement.

FIG. 10 shows a base layer with subtitles according to an embodiment of the present disclosure.

FIG. 11 shows an enhancement layer with subtitles according to an embodiment of the present disclosure.

FIG. 12 shows an embodiment of the present disclosure, where support of subtitles and/or graphic overlays in a scalable, full resolution, frame compatible 3D system is shown. Subtitles and/or graphic overlays are added separately, with the appropriate offsets, in each layer and before multiplexing the data into separate left and right views.

FIG. 13 shows a further embodiment of the present disclosure, where overlay generation for the one or more enhancement layers is provided by a prediction module associated with the base layer overlay generator.

FIG. 14 shows the final left view with subtitle text (after remixing).

FIG. 15 shows the final right view with subtitle text (after remixing).

DESCRIPTION OF EXAMPLE EMBODIMENTS

The present disclosure describes systems and methods supporting full resolution graphic overlays (e.g., graphics, menus, arrows, buttons, captions, banners, picture in picture information) and subtitles in frame compatible 3D delivery for a scalable system.

According to a first aspect, a method for embedding subtitles and/or graphic overlays in a frame compatible 3D video encoding system comprising a base layer and at least one enhancement layer is provided, the method comprising: providing the subtitles and/or graphic overlays separately for the base layer and the at least one enhancement layer.

According to a second aspect, a method for embedding subtitles and/or graphic overlays in a frame compatible 3D video encoding system comprising a plurality of layers and at least one enhancement layer is provided, the method comprising: providing the subtitles and/or graphic overlays separately for each layer, wherein the subtitles and/or graphic overlays provided from some layers are predicted from the subtitles and/or graphic overlays provided by one or more other layers.

According to a third aspect, a system for embedding subtitles and/or graphic overlays in a frame compatible 3D video scalable system comprising a base layer and one or more enhancement layers is provided, the system comprising: a base layer subtitles and/or graphic overlays generator; and one or more enhancement layer subtitles and/or graphic overlays generators for the respective one or more enhancement layers.

According to a fourth aspect, a system for embedding subtitles and/or graphic overlays in a frame compatible 3D video scalable system comprising a base layer and one or more enhancement layers is provided, the system comprising: a base layer subtitles and/or graphic overlays generator; a predictor connected with the base layer subtitles and/or graphic overlays generator, the predictor processing the base layer subtitles and/or graphic overlays and generating enhancement layer subtitles and/or graphic overlays for the one or more enhancement layers.

Scalable systems comprise multiple layers: a base layer and several (one or more) enhancement layers, where the base layer can enable a first representation of the video signal when decoded. The base layer representation, in this scenario, is based on frame multiplexing of two stereo views, e.g. side by side or over-under (frame compatible 3D), and is essentially of half resolution given the sampling process for each stereo view. The additional enhancement layers, if available and decoded, allow for further quality enhancement and, essentially, for the reconstruction of the full resolution signal for both views. Such systems are described in U.S. Provisional Application No. 61/223,027, filed on Jul. 4, 2009, incorporated herein by reference in its entirety.

The teachings of the present disclosure can be applied to video authoring systems, video encoders and decoders such as Blu-ray players, set-top boxes, and software players, displays, and encoder/decoder chips. A video authoring system is a tool that allows the editing and creation of a DVD, Blu-ray, or other multimedia storage format, including online multimedia formats. The editing process may include any modifications to the video and audio signals, such as cropping, scaling, creation of different transitions, placement of video clips at different time intervals, and creation of menus, graphics, and subtitles in different languages, among others.

In accordance with embodiments of the present disclosure, 3D video content can be provided to consumers using a scalable video coding system consisting of multiple layers, such as a base layer and one or more enhancement layers as described in Annex A, which forms part of the specification of the present application. In the base layer, 3D video information from two separate, subsampled views is multiplexed into a single frame using one of a variety of arrangements, such as side by side, line interleaved, or over-under, among others.

Subsampling may have been performed using a variety of sampling methods, such as horizontal, vertical, and quincunx, among others. The multiplexed frame in this layer has characteristics very similar to those of a 2D video frame and can be encoded using conventional methods, such as video coding standards and codecs like MPEG-2, MPEG-4 AVC/H.264, and VC-1, among others. This layer can be decoded using single decoder systems without any other hardware assistance and, with appropriate display devices such as micropolarized displays, enables a viewer to experience a 3D movie, albeit at a reduced resolution.
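The packing arrangements and sampling methods described above can be illustrated with a minimal Python sketch. It assumes a hypothetical representation of each view as a list of rows of integer samples, even frame dimensions, and an odd sampling phase for the right view in the base layer; none of these names or choices are taken from Annex A or from any codec API.

```python
from typing import List

# Hypothetical image representation: a list of rows of integer samples.
Image = List[List[int]]


def subsample_columns(view: Image, phase: int) -> Image:
    """Horizontal sampling: keep every other column, starting at column `phase`."""
    return [row[phase::2] for row in view]


def subsample_rows(view: Image, phase: int) -> Image:
    """Vertical sampling: keep every other row, starting at row `phase`."""
    return view[phase::2]


def subsample_quincunx(view: Image, phase: int) -> Image:
    """Quincunx sampling: keep a checkerboard pattern, compacted to half width."""
    return [row[(r + phase) % 2::2] for r, row in enumerate(view)]


def pack_side_by_side(left: Image, right: Image) -> Image:
    """Place the horizontally subsampled left and right views in the two halves."""
    l_half = subsample_columns(left, 0)
    r_half = subsample_columns(right, 1)  # assumed odd phase for the right view
    return [lr + rr for lr, rr in zip(l_half, r_half)]


def pack_over_under(left: Image, right: Image) -> Image:
    """Stack the vertically subsampled left view on top of the right view."""
    return subsample_rows(left, 0) + subsample_rows(right, 1)
```

For example, with a 2×4 left view `[[1, 2, 3, 4], [5, 6, 7, 8]]` and right view `[[10, 20, 30, 40], [50, 60, 70, 80]]`, `pack_side_by_side` yields `[[1, 3, 20, 40], [5, 7, 60, 80]]`: a frame of the original size in which each view occupies half the width, which is what lets the base layer be treated much like an ordinary 2D frame.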

As shown in Appendix A, however, the enhancement layer or layers of this system enable the reconstruction of the full resolution 3D signal. Essentially, the enhancement layer or layers contain the information missing from the base layer, such as samples or frequency information, that was lost during the creation of the base layer. For efficiency, the enhancement layer or layers use the base layer, and/or previously encoded enhancement layers, as a predictor, since there exists very high correlation between the current enhancement layer samples and the samples of other layers. The process may include additional mechanisms that can further increase correlation, such as interpolation filters, motion estimation and compensation, and weighted prediction, among others. At the decoder, after the reconstruction of the enhancement layer, an additional process that combines the data of the base layer with the data of the enhancement layer or layers is performed in order to reconstruct the full resolution 3D images. The entire process is shown in FIG. 1 of the present application, which is also described in U.S. Provisional Application No. 61/223,027, filed on Jul. 4, 2009, incorporated herein by reference in its entirety. See, in particular, FIG. 11 and related portions of the specification.

Although video information is of the highest importance in this system, other information can also be of high importance and can affect the 3D experience of a user. In particular, it may be desirable to provide the user with graphic overlay and/or subtitle information, including 3D subtitles, or to highlight certain content in the video using appropriate graphics information that may be associated with the video. This is especially true if the video content is to be packaged on a media device such as a DVD or Blu-ray disc, or even if it is delivered over the Internet, a cable, or a satellite system. A user would expect such functionalities, including the presence of and ability to navigate through appropriate 2D and even 3D menus, to be available both when using only the base layer and when using all available layers.

For the base layer, the simplest method of providing such functionalities is to create graphic overlays and/or subtitles while considering the frame packing method, e.g. side by side (see FIGS. 2, 4 and 6) or over-under (see FIGS. 3, 5 and 7), during the authoring process.

According to an embodiment of the present disclosure, the content creator authors the content by considering the 3D video format used, and replicates this information for each segment, where a segment represents the area in the frame that corresponds to a certain view, i.e. the left or right view. These graphics may also be rendered while, optionally, considering depth information, allowing further flexibility to the content creator. For example, using a different offset for the subtitle text associated with the left segment than for the subtitle text associated with the right segment creates the illusion to a viewer that the text is at a different depth level than other information in the video signal. By modifying such offsets, it is in fact possible to assign different depths to different objects within a scene.
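To make the offset idea concrete, the following sketch computes where a subtitle would start in each half of a side by side packed frame. It assumes that the disparity is specified at full view resolution as x_left − x_right, that it is split evenly between the two views, and that each half is horizontally subsampled by two; the function and parameter names are hypothetical.

```python
from typing import Tuple


def subtitle_columns_sbs(x_full: int, disparity_full: int, frame_width: int) -> Tuple[int, int]:
    """Return the starting column of a subtitle in the left and right halves
    of a side by side packed frame (hypothetical helper).

    x_full:         nominal horizontal position at full view resolution.
    disparity_full: x_left - x_right at full view resolution. Under the usual
                    convention, a positive (crossed) disparity places the text
                    in front of the screen plane.
    frame_width:    width of the packed frame; each half is frame_width // 2.
    """
    half = frame_width // 2
    # Split the disparity between the two views at full resolution.
    x_left_full = x_full + disparity_full // 2
    x_right_full = x_left_full - disparity_full
    # Each half is horizontally subsampled by 2, so positions are halved,
    # and the right view's half starts at column `half` of the packed frame.
    return x_left_full // 2, half + x_right_full // 2
```

With `x_full=800`, a 20-pixel disparity, and a 1920-wide packed frame, the text starts at column 405 in the left half and column 1355 in the right half; with zero disparity both copies sit at the same relative position (columns 400 and 1360), i.e. at screen depth.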

Although this has already been done for frame compatible signals such as side by side (FIGS. 2 and 4) and over-under packed (FIGS. 3 and 5) information, it is highly desirable, in accordance with embodiments of the present disclosure, that the same functionality also be retained when the multi-layered system discussed previously and enclosed in Annex A is used.

One method of adding such information could be to add separate graphics engines after the reconstruction of the full resolution images. However, this makes the system more expensive and less flexible: it implies that additional subtitle tracks are present within the video, since the base layer graphics information, which is formatted differently, must be preserved, and that additional control and processing are present in the system, making such a solution more expensive to implement. A different method, which again may be too complex, would be to reprocess the base layer information, extract the graphics for the left and right images separately, and add them back to the full resolution images.

Instead, in accordance with the teachings of the present disclosure, a simpler method is presented that also enables full resolution reconstruction of graphics information, without significantly penalizing the design of the system.

In particular, instead of adding the graphics elements directly on the final, reconstructed left and right images, graphics elements are added separately on both the base and enhancement layer information prior to the final view reconstruction process. This implies that graphics are again added on top of these layers according to the packing arrangement used for the video signal. More specifically, if the video signal is generated using the side by side packing arrangement, graphics (e.g. subtitles, captions etc.) are created using the same arrangement and added on both the base and enhancement layers separately.

An example is shown in FIG. 8 for the base layer and FIG. 9 for the enhancement layer. The final separate view images, with the appropriate full resolution graphics (i.e., the graphics generated by combining the base and enhancement layer graphics in the same way as the actual images are synthesized), are synthesized by performing the view reconstruction only after all graphics have been added on both images, as also shown in FIG. 10 and FIG. 11.

The system and method according to the present disclosure are shown in FIG. 12, where, after the base or enhancement layer is decoded, the appropriate graphics (e.g., interactive graphics IG and/or presentation graphics PG) are also created and added on top of the video data. Then, the new video data, with the overlaid graphics, are multiplexed together to generate the final, separate 3D images, as also shown in FIG. 14 and FIG. 15.

Turning to the description of FIG. 12, it should be noted that the overlay generators (710), (720), (730) can be provided at locations (1110), (1120), (1130), respectively, of the system shown in FIG. 1. In particular, as shown in FIG. 12, overlay generators (710), (720), (730) act on graphic planes (740), (750), (760), respectively, at the output of video decoders (770), (780) and (790). Therefore, in accordance with embodiments of the present disclosure, subtitles and/or graphic overlays are provided separately for each of the base layer and at least one enhancement layer.
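The ordering in FIG. 12, where each overlay generator acts on its own layer's decoded output before the layers are multiplexed, can be sketched as follows. The sketch assumes a hypothetical graphics plane in which `None` marks a transparent sample and any other value is fully opaque (no anti-aliasing or alpha blending); it is not the interface of any particular player, nor of the generators (710), (720), (730).

```python
from typing import List, Optional

Image = List[List[int]]
# Hypothetical graphics plane: None = transparent, int = opaque graphics sample.
Plane = List[List[Optional[int]]]


def overlay(video: Image, plane: Plane) -> Image:
    """Hypothetical overlay generator: opaque graphics samples replace video samples."""
    return [
        [v if g is None else g for v, g in zip(vrow, grow)]
        for vrow, grow in zip(video, plane)
    ]


def decode_with_overlays(decoded_layers: List[Image], planes: List[Plane]) -> List[Image]:
    """Apply one graphics plane per decoded layer (base layer first),
    before any view reconstruction takes place."""
    if len(decoded_layers) != len(planes):
        raise ValueError("one graphics plane is needed per layer")
    return [overlay(video, plane) for video, plane in zip(decoded_layers, planes)]
```

Only after this step would the per-layer results be passed to the view reconstruction (multiplexing) stage, so that the graphics are brought to full resolution together with the video.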

Moreover, according to a further embodiment of the present disclosure, generation of subtitles and/or overlay graphics for the enhancement layer or layers can be provided by interpolating the base layer data, as also noted later in the present disclosure.

According to embodiments of the present disclosure, the different sampling performed for the base vs. the enhancement layer is also taken into account. In particular, for the base layer and for side by side packing, the left view may have been sampled by skipping every other horizontal pixel starting from column 0, while the right view may have been sampled by skipping every other horizontal pixel starting from column −1. On the other hand, sampling for the enhancement layer is reversed, i.e. sampling starts from column −1 for the left view and column 0 for the right view. Given these characteristics of the base and enhancement layers, it is desirable that the graphics also be sampled using exactly the same method.
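The complementary column sampling described above, and why the graphics should follow it, can be checked with a small sketch. It assumes side by side packing, even frame widths, and the phases given in the text (base layer: left view on even columns, right view on odd columns; enhancement layer: the reverse, treating "column −1" as the odd phase). The helper names are hypothetical, and a real decoder would reconstruct from decoded rather than original samples.

```python
from typing import List, Tuple

Image = List[List[int]]


def sbs_layer(left: Image, right: Image, left_phase: int, right_phase: int) -> Image:
    """Build one side by side layer from the given column phases (0 = even, 1 = odd)."""
    return [l[left_phase::2] + r[right_phase::2] for l, r in zip(left, right)]


def encode_layers(left: Image, right: Image) -> Tuple[Image, Image]:
    """Base: left even / right odd columns; enhancement: the complementary columns."""
    base = sbs_layer(left, right, 0, 1)
    enh = sbs_layer(left, right, 1, 0)
    return base, enh


def interleave(even: List[int], odd: List[int]) -> List[int]:
    """Merge two half-width rows back into one full-width row."""
    out = [0] * (len(even) + len(odd))
    out[0::2] = even
    out[1::2] = odd
    return out


def reconstruct(base: Image, enh: Image) -> Tuple[Image, Image]:
    """Re-interleave the halves of both layers into full resolution left/right views."""
    left, right = [], []
    for b, e in zip(base, enh):
        h = len(b) // 2
        left.append(interleave(b[:h], e[:h]))   # base layer holds the even left columns
        right.append(interleave(e[h:], b[h:]))  # enhancement holds the even right columns
    return left, right
```

Applied to a one-row subtitle fragment, for example left `[[0, 9, 9, 0]]` and right `[[9, 9, 0, 0]]`, the round trip through `encode_layers` and `reconstruct` returns the original samples exactly, because each layer's graphics use the same phases as that layer's video. If the enhancement layer graphics were instead sampled with the base layer phases, `reconstruct` would place even-column samples at odd positions and the full resolution text would be distorted.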

Additionally, in some systems, sampling of subtitles and/or graphic overlays can be done with anti-aliasing and/or filtering disabled, so that the subtitles and/or graphic overlays are sampled using the same sampling methods as the base and enhancement layers; this ensures that the full resolution reconstruction of the graphics does not lose any information.

In a different embodiment, the enhancement layer graphics data may be predicted or reconstructed from those of the base layer, in a similar way to the video data. In particular, instead of sending the information multiple times, in both the base and enhancement layers, the data may be present only in the base layer. However, both base and enhancement layer graphics units or overlay generators (710), (720), (730) of FIG. 12 can use the same data to generate or synthesize the graphics overlay information, such as subtitle text, without having to perform any additional rendering. The base and synthesized enhancement layer graphics overlays are then added to the base and enhancement video layers, respectively.

In a separate embodiment, the enhancement layer unit or units (720), (730) may perform additional processing, e.g. different filtering or interpolation/sampling, to generate graphics with a sampling different from that of the base layer, without having to render the graphics overlay separately. For example, the enhancement layer graphics overlay may be generated by simply copying the data from the base layer, or by interpolating the base layer data using a horizontal interpolation filter such as the H.264 six-tap interpolation filter, or bilinear, bicubic, or Lanczos interpolation.
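As an illustration of deriving enhancement layer graphics from the base layer without re-rendering, the sketch below applies the H.264 luma half-sample filter taps (1, −5, 20, 20, −5, 1), rounding with (x + 16) >> 5 and clipping to 8 bits, to one row of a base layer overlay. It assumes the side by side phases described earlier (base left view on even columns, base right view on odd columns), 8-bit samples, and edge replication at the row boundaries; the function names are hypothetical.

```python
from typing import List

H264_HALF_PEL_TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample filter


def _sample(row: List[int], i: int) -> int:
    """Replicate edge samples outside the row (assumed padding rule)."""
    return row[min(max(i, 0), len(row) - 1)]


def half_sample(row: List[int], k: int) -> int:
    """Interpolate the value halfway between row[k] and row[k + 1]."""
    acc = sum(t * _sample(row, k - 2 + j) for j, t in enumerate(H264_HALF_PEL_TAPS))
    return min(max((acc + 16) >> 5, 0), 255)  # round, then clip to 8 bits


def predict_enh_left(base_left_row: List[int]) -> List[int]:
    """Base left half holds columns 0, 2, 4, ...; the enhancement layer needs
    columns 1, 3, 5, ..., and column 2k+1 lies between base samples k and k+1."""
    return [half_sample(base_left_row, k) for k in range(len(base_left_row))]


def predict_enh_right(base_right_row: List[int]) -> List[int]:
    """Base right half holds columns 1, 3, 5, ...; the enhancement layer needs
    columns 0, 2, 4, ..., and column 2k lies between base samples k-1 and k."""
    return [half_sample(base_right_row, k - 1) for k in range(len(base_right_row))]


def copy_prediction(base_row: List[int]) -> List[int]:
    """Simplest alternative mentioned in the text: reuse the base samples as-is."""
    return list(base_row)
```

Applied to a hard-edged 0/255 subtitle row such as `[0, 0, 255, 255]`, the sample halfway between indices 1 and 2 comes out as 128, so interpolated enhancement layer text is smoother than separately rendered, unfiltered text; copying instead preserves hard edges but repeats base layer samples after reconstruction.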

A further embodiment of the present disclosure is shown in FIG. 13, where a predictor module (895) connected with a base layer overlay generator (810) is shown, and where the predictor module (895) operates as an overlay generator for the one or more enhancement layers. If desired, the predictor module (895) can perform interpolation of the base layer data and provide the interpolated data to the enhancement layers.

According to a further embodiment, in the case of a system having multiple layers, prediction can be done from a certain layer or layers. In other words, N layers may be generated and M layers predicted. This can be especially true for the case of multiview coding.

The graphics that could be added may include subtitle information, captions, buttons, arrows, and other graphics, but could also include textures and/or images. These graphics could be stationary or moving, 2D or 3D. In a special case, this may involve the addition of a Picture-in-Picture signal, where the decoder may wish to overlay a different program on top of a 3D video. In this case, this video would have to be rendered properly on both the left and right views. This implies that, for the base layer, the signal would have to be sampled appropriately (i.e. using the same sampling that was used to generate the base layer for the video) and rendered on both the left and right subimages of the frame packing arrangement used, and should be overlaid on top of both the base and enhancement layers.

Apart from the decoder, embodiments of the present disclosure provide for an authoring and encoding method and system which allows the creation of such appropriate graphics information as discussed in previous sections. Such an authoring method and system may have the purpose of creating and authoring disc storage media such as a Blu-ray disc, or content for other distribution systems such as broadcast, satellite, and/or the Internet.

The teachings of the present disclosure also apply to multi-view cases,where more than two views for a scene are available.

The methods and systems described in the present disclosure may be implemented in hardware, software, firmware or a combination thereof. Features described as blocks, modules or components may be implemented together (e.g., in a logic device such as an integrated logic device) or separately (e.g., as separate connected logic devices). The software portion of the methods of the present disclosure may comprise a computer-readable medium which comprises instructions that, when executed, perform, at least in part, the described methods. The computer-readable medium may comprise, for example, a random access memory (RAM) and/or a read-only memory (ROM). The instructions may be executed by a processor (e.g., a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA)).

The examples set forth above are provided to give those of ordinary skill in the art a complete disclosure and description of how to make and use the embodiments of the method for support of full resolution graphics, menus and subtitles in frame compatible 3D delivery of the disclosure, and are not intended to limit the scope of what the inventors regard as their disclosure. Modifications of the above-described modes for carrying out the disclosure may be used by persons of skill in the video art, and are intended to be within the scope of the following claims. All patents and publications mentioned in the specification may be indicative of the levels of skill of those skilled in the art to which the disclosure pertains. All references cited in this disclosure are incorporated by reference to the same extent as if each reference had been incorporated by reference in its entirety individually.

It is to be understood that the disclosure is not limited to particular methods or systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the content clearly dictates otherwise. The term “plurality” includes two or more referents unless the content clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the disclosure pertains.

A number of embodiments of the disclosure have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the present disclosure. Accordingly, other embodiments are within the scope of the following claims.

1. A method for embedding one or more of subtitles or graphic overlays in a frame compatible 3D video encoding system that comprises a base layer and at least one enhancement layer, the method comprising: providing video content associated with the subtitles or graphic overlays separately for the base layer and the at least one enhancement layer; embedding the subtitles or graphic overlays separately for the base layer and the at least one enhancement layer to the video content, wherein the subtitles and/or graphic overlays provided for the at least one enhancement layer are copies of the subtitles and/or graphic overlays provided for the base layer or interpolated from the subtitles and/or graphic overlays provided for the base layer, and wherein the subtitles or graphic overlays for the base layer and the at least one enhancement layer are at a lower resolution in relation to the content; and combining, subsequent to the embedding, the base layer and the at least one enhancement layer to form the video content and the subtitles and/or graphic overlays at a higher resolution, wherein the subtitles and/or graphic overlays are embedded in the video content.
2. The method as recited in claim 1, wherein the one or more of the subtitles or graphic overlays comprise depth information.
3. The method as recited in claim 1, wherein the base layer is sampled according to a first sampling method and the at least one enhancement layer is sampled according to a second sampling method, the method further comprising: sampling the subtitles and/or graphic overlays for the base layer according to the first sampling method; and sampling the subtitles and/or graphic overlays for the at least one enhancement layer according to the second sampling method; wherein the second sampling method obtains samples not obtained by the first sampling method.
4. The method as recited in claim 1, wherein the subtitles and/or graphic overlays are provided independently for each layer.
5. The method as recited in claim 1, wherein: each layer comprises at least a first view and a second view at the lower resolution, the embedding the subtitles and/or graphic overlays separately is for the first view and the second view at each layer, and the combining forms the first view, the second view, and the one or more subtitles or graphic overlays on the first view and the second view at the higher resolution.
6. The method as recited in claim 1, wherein the one or more embedded subtitles or graphic overlays in a frame compatible 3D video are authored according to one or more of the providing, embedding or combining steps.
7. A system for embedding one or more of subtitles or graphic overlays in a frame compatible 3D video scalable system that comprises a base layer and one or more enhancement layers, wherein the base layer and the one or more enhancement layers comprise video content that is associated with the subtitles or graphic overlays, the system comprising: at least one generator that generates one or more of base layer subtitles or graphic overlays to provide base layer subtitles or graphic overlays that have a lower resolution in relation to a resolution that is associated with the video content for the base layer; at least one generator that generates one or more enhancement layer subtitles or graphic overlays for the respective one or more enhancement layers, wherein each enhancement layer subtitles or graphic overlays generator provides enhancement layer subtitles or graphic overlays that have a lower resolution in relation to the video content for the at least one enhancement layer; and one or more combiners configured to combine the base layer and the one or more enhancement layers to form the video content and the subtitles and/or graphic overlays at a higher resolution, wherein the subtitles and/or graphic overlays are embedded in the video content; wherein the subtitles or graphic overlays from the one or more enhancement layer subtitles or graphic overlays generators comprise copies of the subtitles or graphic overlays that are: provided for the base layer; or interpolated from the subtitles or graphic overlays from the base layer subtitles.
8. The system as recited in claim 7, wherein the base layer comprises a base layer video decoder and the one or more enhancement layers comprise respective enhancement layer video decoders and wherein: the base layer subtitles and/or graphic overlays generator operates on a base layer graphics plane at the output of the base layer video decoder; and each enhancement layer subtitles or graphic overlays generator for the one or more enhancement layers operates on an enhancement layer graphics plane at the output of a respective enhancement layer video decoder.
9. The system as recited in claim 7, wherein the subtitles or graphic overlays comprise depth information.
10. The system as recited in claim 7, wherein samples provided by the one or more enhancement layer subtitles and/or graphic overlays generators differ from samples provided by the base layer subtitles or graphic overlays generator.
11. The system as recited in claim 7, wherein: each of the base layer and the one or more enhancement layers comprises at least a first view and a second view at the lower resolution; and the one or more combiners are configured to combine the base layer and the one or more enhancement layers to form the first view, the second view, and the subtitles and/or graphic overlays on the first view and the second view at the higher resolution.
12. A computer readable storage medium comprising encoded instructions, which when executed by a device or a processor, cause, control, program or configure the device or processor to perform a process for embedding one or more of subtitles or graphic overlays in a frame compatible 3D video encoding system that comprises a base layer and at least one enhancement layer, wherein the process comprises the steps of: providing video content associated with the subtitles and/or graphic overlays separately for the base layer and the at least one enhancement layer; embedding the subtitles or graphic overlays separately for the base layer and the at least one enhancement layer to the video content, wherein the subtitles and/or graphic overlays provided for the at least one enhancement layer are copies of the subtitles or graphic overlays provided for the base layer or interpolated from the subtitles or graphic overlays provided for the base layer, and wherein the subtitles or graphic overlays for the base layer and the at least one enhancement layer are at a lower resolution in relation to the content; and combining, subsequent to the embedding, the base layer and the at least one enhancement layer to form the video content and the subtitles or graphic overlays at a higher resolution, wherein the higher resolution subtitles or graphic overlays are embedded in the video content.
13. The computer readable storage medium as recited in claim 12, wherein the embedded subtitles or graphic overlays are authored according to one or more of the providing, embedding or combining steps.
14. The computer readable storage medium as recited in claim 13, wherein the storage medium comprises a feature or component of one or more of a computer drive, an optical, flash or magnetic data storage medium or device or a software product or wherein the encoded instructions are sent, transferred, transmitted, streamed, communicated or conducted over a component of a wireless or wire line network.
15. A system for embedding one or more of subtitles or graphic overlays in a frame compatible 3D video encoding system that comprises a base layer and at least one enhancement layer, the system comprising: means for providing video content associated with the subtitles and/or graphic overlays separately for the base layer and the at least one enhancement layer; means for embedding the subtitles or graphic overlays separately for the base layer and the at least one enhancement layer to the video content, wherein the subtitles or graphic overlays provided for the at least one enhancement layer comprise copies of the subtitles or graphic overlays provided for the base layer or interpolated from the subtitles or graphic overlays provided for the base layer, and wherein the subtitles or graphic overlays for the base layer and the at least one enhancement layer have a lower resolution in relation to the content; and means for combining, subsequent to a function of the embedding means, the base layer and the at least one enhancement layer to form the video content and the subtitles or graphic overlays at a higher resolution, wherein the higher resolution subtitles or graphic overlays are embedded in the video content.